
[Build] fix file_data_loader.cpp build issues for windows #4899

Draft
python3kgae wants to merge 4 commits into pytorch:main from python3kgae:win_bld_file


@python3kgae
Contributor

@python3kgae python3kgae commented Aug 26, 2024

Two Windows build issues have been addressed in this pull request:

  1. Avoid closing when fd_ is -1, as this would cause a crash on Windows.
  2. Introduce file.h, enabling Windows builds to utilize io.h and implement pread.

For #4661
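Here is a minimal sketch of the first fix. It uses a hypothetical RAII wrapper named FdOwner, not the real FileDataLoader. A moved-from object is left holding fd_ == -1, and its destructor must skip close() for that value. The sketch uses POSIX open()/close() so it builds on Linux or macOS. On Windows the same guard matters because, per Microsoft's CRT documentation, passing an invalid descriptor to _close() invokes the invalid-parameter handler, which by default ends the process.

```cpp
#include <cassert>
#include <fcntl.h>   // open, fcntl (used in the usage check)
#include <unistd.h>  // close
#include <utility>   // std::move

// Hypothetical RAII owner of a file descriptor, illustrating the destructor
// guard. It is not the real FileDataLoader.
class FdOwner {
 public:
  explicit FdOwner(int fd) : fd_(fd) {}

  // Moving transfers ownership and leaves the source holding -1.
  FdOwner(FdOwner&& other) noexcept : fd_(other.fd_) { other.fd_ = -1; }
  FdOwner(const FdOwner&) = delete;
  FdOwner& operator=(const FdOwner&) = delete;
  FdOwner& operator=(FdOwner&&) = delete;

  ~FdOwner() {
    // Only close descriptors we actually own; -1 means "nothing to close".
    if (fd_ != -1) {
      ::close(fd_);
    }
  }

  int fd() const { return fd_; }

 private:
  int fd_;
};
```

When both objects go out of scope, only the move target closes the descriptor. The moved-from object never hands -1 to close().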

@pytorch-bot

pytorch-bot bot commented Aug 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4899

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 3 Awaiting Approval

As of commit bc600ce with merge base a79b1a6:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 26, 2024
@kirklandsign
Contributor

Thank you @python3kgae !

I would prefer that we use a single file.h that checks #ifdef _WIN32. When we #include it, please use the full path starting from executorch/... rather than adding an include_dir in CMake. Or, at the least, use a file.h that selects the correct file_posix.h or file_windows.h based on a compiler macro. We also maintain an internal build system, and it's easier that way.

cc @dbort

Two build issues have been addressed in this pull request:

1. Avoid closing when fd_ is -1, as this would cause a crash on Windows.
2. Introduce separate headers for Windows and Unix, enabling Windows builds
   to utilize a distinct header and implement pread.

For pytorch#4661
@python3kgae
Contributor Author

Changed to a single file.h.

@kirklandsign
Contributor

Thank you! One more request: in extension/data_loader/targets.bzl line 43, can you add a line

headers = ["file.h"],

@kirklandsign kirklandsign requested a review from dbort August 28, 2024 17:22
@python3kgae
Contributor Author

Like this?

@@ -40,6 +40,7 @@ def define_common_targets():
     runtime.cxx_library(
         name = "file_data_loader",
         srcs = ["file_data_loader.cpp"],
+        headers = ["file.h"],
         exported_headers = ["file_data_loader.h"],
         visibility = [
             "//executorch/test/...",

@kirklandsign
Contributor

Yes

@python3kgae
Contributor Author

Done.

Contributor

@dbort dbort left a comment

Normally we try to avoid using platform-based #ifdefs for implementations, and will create separate .cpp files that the build system brings in as necessary. But this is pretty light-weight and self-contained, so it seems fine.

@@ -0,0 +1,57 @@
/*
Contributor

Please call this file pread.h to make its purpose more clear.

Contributor Author

Updated.

* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree.
*/

Contributor

Please add a comment describing the purpose of this header, i.e., that it ensures a pread-compatible function is defined in the global namespace for Windows and POSIX environments.

Contributor Author

Updated.


#include <windows.h>

inline ssize_t pread(int __fd, void* __buf, size_t __nbytes, size_t __offset) {
Contributor

Please remove the leading underscores from the param names. Since this is our code and not part of the standard library, we shouldn't define any names beginning with underscore.

Contributor Author

Updated.

Comment on lines +22 to +23
overlapped.Offset = __offset;
overlapped.OffsetHigh = __offset >> 32;
Contributor

Will this do the right thing on both 32-bit and 64-bit architectures? It seems like it should, but I don't know enough about how DWORD is defined on different architectures.

Contributor Author

DWORD will always be 32 bits on both 32-bit and 64-bit Windows systems.

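Note, however, that ReadFile's nNumberOfBytesToRead parameter is also a DWORD. If we ever need to read more than 4 GB in a single call, the current implementation will need to be updated to handle larger reads. The 64-bit offset is already split across Offset and OffsetHigh, so only the size of a single read is limited, not the file size.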
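A caller can stay under this per-read limit by splitting a large read into bounded chunks. The sketch below does this over POSIX pread(). The names read_fully and max_chunk are hypothetical. The chunk limit a real caller would pick (anything up to 0xFFFFFFFF for a ReadFile-based shim) is an assumption.

```cpp
#include <algorithm>    // std::min
#include <cassert>
#include <cerrno>       // errno, EINTR
#include <cstddef>      // size_t
#include <cstdio>       // std::tmpfile, fileno (used in the usage check)
#include <cstring>      // std::memcmp (used in the usage check)
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // pread, pwrite

// Hypothetical helper: reads exactly `size` bytes starting at `offset`,
// issuing pread() calls of at most `max_chunk` bytes each.
// Returns false on an I/O error or if EOF arrives before `size` bytes.
inline bool read_fully(int fd, void* buf, size_t size, off_t offset,
                       size_t max_chunk) {
  auto* out = static_cast<unsigned char*>(buf);
  size_t done = 0;
  while (done < size) {
    const size_t chunk = std::min(size - done, max_chunk);
    const ssize_t n =
        ::pread(fd, out + done, chunk, offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted by a signal before any data was read; retry
      }
      return false;
    }
    if (n == 0) {
      return false;  // EOF (or max_chunk == 0) before `size` bytes were read
    }
    done += static_cast<size_t>(n);
  }
  return true;
}
```

Because each pread() carries its own offset, no shared file position is involved. The chunk size therefore only bounds the length of each individual call.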

Comment on lines +54 to +56
// To avoid conflicts with std::numeric_limits<int32_t>::max() in
// file_data_loader.cpp.
#undef max
Contributor

Please move this to the top of the file, closer to the header that defined it. If you know, please mention which header defined it, since on its own it's not clear why this file should bother undefining a macro that it didn't define.

Contributor Author

Done.
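Besides #undef, another common way to coexist with the min/max macros from <windows.h> is to parenthesize the function name. A function-like macro only expands when its name is immediately followed by "(", so the parenthesized form is left alone.

The sketch below simulates those macros after the standard headers are included. The macro bodies roughly mirror the ones <windows.h> provides when NOMINMAX is not defined. The helpers largest_chunk and clamp_chunk are hypothetical. Whether this style suits the ExecuTorch codebase is a judgment call, not something the reviewers asked for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Simulated <windows.h> macros, defined after the standard headers so they
// cannot break them.
#define max(a, b) (((a) > (b)) ? (a) : (b))
#define min(a, b) (((a) < (b)) ? (a) : (b))

// "max" is followed by ")" here, not "(", so the macro is not expanded.
inline int32_t largest_chunk() {
  return (std::numeric_limits<int32_t>::max)();
}

// Same trick for std::min: the parentheses suppress macro expansion while
// template argument deduction still works normally.
inline size_t clamp_chunk(size_t remaining) {
  return (std::min)(remaining, static_cast<size_t>(largest_chunk()));
}
```

Without the parentheses, std::numeric_limits<int32_t>::max() would fail to compile while the two-argument max macro is defined.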

@shoumikhin
Contributor

@tarun292, @kirklandsign can you take a look please?

@github-actions

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the stale PRs inactive for over 60 days label Aug 27, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses native Windows build failures in the extension/data_loader file-backed loader by making fd cleanup safe on Windows and adding a Windows-compatible pread implementation.

Changes:

  • Add extension/data_loader/pread.h to provide a pread-compatible function on Windows (using Win32 APIs) while using the POSIX pread on non-Windows platforms.
  • Update FileDataLoader to include pread.h and remove the direct <unistd.h> dependency.
  • Avoid calling close() when fd_ == -1 in FileDataLoader’s destructor (moved-from safety on Windows).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

  • extension/data_loader/targets.bzl: Adds pread.h as a private header for the file_data_loader target.
  • extension/data_loader/pread.h: Introduces a cross-platform pread shim with a Windows implementation.
  • extension/data_loader/file_data_loader.cpp: Switches to the new pread shim and avoids closing -1 fds.


Comment on lines +16 to +26
#include <executorch/runtime/platform/compiler.h> // For ssize_t.
#include <io.h>

#include <windows.h>
// To avoid conflicts with std::numeric_limits<int32_t>::max() in
// file_data_loader.cpp.
#undef max

inline ssize_t pread(int fd, void* buf, size_t nbytes, size_t offset) {
OVERLAPPED overlapped; /* The offset for ReadFile. */
memset(&overlapped, 0, sizeof(overlapped));
Copilot AI Feb 25, 2026

pread.h isn't self-contained on Windows: it uses memset, errno, and EIO but doesn't include headers that declare/define them. Since file_data_loader.cpp includes this header before <cstring>/<cerrno>, this can cause Windows compile errors. Add the appropriate standard headers (e.g., <cstring> and <cerrno>/<errno.h>).

Comment on lines +27 to +28
overlapped.Offset = offset;
overlapped.OffsetHigh = offset >> 32;
Copilot AI Feb 25, 2026

overlapped.OffsetHigh = offset >> 32; can be undefined behavior on 32-bit builds because offset is a size_t (likely 32 bits) and shifting by 32 is UB. Consider first widening to uint64_t and then assigning the low/high 32-bit parts with explicit casts to DWORD to avoid UB and narrowing warnings.

Suggested change
- overlapped.Offset = offset;
- overlapped.OffsetHigh = offset >> 32;
+ ULONGLONG offset64 = (ULONGLONG)offset;
+ overlapped.Offset = (DWORD)offset64;
+ overlapped.OffsetHigh = (DWORD)(offset64 >> 32);
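The widening in the suggestion can be checked without Windows headers. In this sketch uint32_t stands in for DWORD, which is always 32 bits. SplitOffset and split_offset are hypothetical names. The low and high halves correspond to OVERLAPPED::Offset and OVERLAPPED::OffsetHigh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// uint32_t stands in for the Win32 DWORD type.
struct SplitOffset {
  uint32_t low;   // would go in OVERLAPPED::Offset
  uint32_t high;  // would go in OVERLAPPED::OffsetHigh
};

inline SplitOffset split_offset(size_t offset) {
  // Widen first: when size_t is 32 bits, `offset >> 32` shifts by the full
  // width of the type, which is undefined behavior.
  const uint64_t wide = static_cast<uint64_t>(offset);
  return {static_cast<uint32_t>(wide), static_cast<uint32_t>(wide >> 32)};
}
```

On a 32-bit build the high half is always 0. On a 64-bit build, offsets past 4 GB land in the high half.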

Comment on lines +30 to +35
BOOL result; /* The result of ReadFile. */
DWORD bytes_read; /* The number of bytes read. */
HANDLE file = (HANDLE)_get_osfhandle(fd);

result = ReadFile(file, buf, nbytes, &bytes_read, &overlapped);
DWORD error = GetLastError();
Copilot AI Feb 25, 2026

ReadFile takes a DWORD byte count, but this wrapper passes size_t nbytes directly. On 64-bit this can silently truncate (and may warn/error under -Wconversion/-Werror). Add an explicit bounds check (e.g., nbytes <= MAXDWORD) and cast to DWORD before calling ReadFile; return -1 with an appropriate errno if the request is too large.
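Here is a sketch of the bounds check this comment asks for. It again uses uint32_t in place of DWORD; UINT32_MAX has the same value as MAXDWORD. The helper name to_dword_count and the choice of EINVAL for an oversized request are assumptions, since the comment only asks for "an appropriate errno".

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts a byte count to a DWORD-sized value, or
// fails with errno = EINVAL instead of silently truncating it.
inline bool to_dword_count(size_t nbytes, uint32_t* out) {
  if (static_cast<uint64_t>(nbytes) > UINT32_MAX) {
    errno = EINVAL;
    return false;
  }
  *out = static_cast<uint32_t>(nbytes);
  return true;
}
```

A pread shim would call this before ReadFile and return -1 when it fails. It would pass the checked value as nNumberOfBytesToRead.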

Comment on lines +49 to +54
break;
default:
errno = EIO;
break;
}
return -1;
Copilot AI Feb 25, 2026

The ERROR_HANDLE_EOF case currently sets errno = 0 and returns -1. POSIX pread signals EOF by returning 0 (not an error). Returning -1 here can lead to misleading logs like strerror(0) ("Success") and incorrect error handling. Return 0 for this case to match pread semantics.

Suggested change
- break;
- default:
- errno = EIO;
- break;
- }
- return -1;
+ return 0;
+ default:
+ errno = EIO;
+ return -1;
+ }
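The POSIX behavior this comment relies on is easy to observe. A pread() that starts at or past the end of a regular file returns 0, and one that straddles the end returns a short count. The helper pread_result_at below is hypothetical scaffolding. It writes four bytes to a temporary file and reports what pread() returns for an 8-byte read at a given offset.

```cpp
#include <cassert>
#include <cstdio>       // std::tmpfile, std::fclose, fileno
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // pread, pwrite

// Returns pread()'s result for an 8-byte read at `offset` from a 4-byte
// scratch file, or -2 if the scratch file could not be set up (a value
// pread itself never returns).
inline ssize_t pread_result_at(off_t offset) {
  std::FILE* f = std::tmpfile();
  if (f == nullptr) {
    return -2;
  }
  const int fd = fileno(f);
  const char data[4] = {'a', 'b', 'c', 'd'};
  ssize_t result = -2;
  if (::pwrite(fd, data, sizeof(data), 0) ==
      static_cast<ssize_t>(sizeof(data))) {
    char buf[8];
    result = ::pread(fd, buf, sizeof(buf), offset);
  }
  std::fclose(f);
  return result;
}
```

With four bytes on disk, offsets 0, 2, 4 and 100 yield 4, 2, 0 and 0 respectively. Only real failures produce -1, which is why ERROR_HANDLE_EOF should map to 0 rather than -1.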

Comment on lines +19 to +21
#include <windows.h>
// To avoid conflicts with std::numeric_limits<int32_t>::max() in
// file_data_loader.cpp.
Copilot AI Feb 25, 2026

On Windows, including <windows.h> without NOMINMAX typically defines both min and max macros. This header only #undef max, but file_data_loader.cpp uses std::min, which can be broken by the min macro. Consider defining NOMINMAX before including <windows.h> and/or #undef min alongside #undef max here.

Suggested change
- #include <windows.h>
- // To avoid conflicts with std::numeric_limits<int32_t>::max() in
- // file_data_loader.cpp.
+ #ifndef NOMINMAX
+ #define NOMINMAX
+ #endif
+ #include <windows.h>
+ // To avoid conflicts with std::min/std::max and std::numeric_limits<int32_t>::max()
+ // in file_data_loader.cpp.
+ #undef min
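Pulling together the review feedback above, a consolidated pread.h might look roughly like the sketch below. It is not the code in this PR. Several details are assumptions:

  • the EINVAL and EBADF choices
  • the ERROR_INVALID_HANDLE case
  • the MSVC-only ssize_t typedef (the PR itself gets ssize_t from executorch/runtime/platform/compiler.h)

Only the POSIX branch can be exercised on Linux or macOS. The Windows branch uses documented CRT and Win32 calls (_get_osfhandle, ReadFile, GetLastError) but is untested here.

```cpp
// Sketch only: ensures a pread-compatible function exists in the global
// namespace on both Windows and POSIX.
#pragma once

#include <cstddef>  // size_t

#ifdef _WIN32
#ifndef NOMINMAX
#define NOMINMAX  // stop <windows.h> from defining min/max macros
#endif
#include <io.h>       // _get_osfhandle
#include <windows.h>  // ReadFile, OVERLAPPED, GetLastError, MAXDWORD

#include <cerrno>   // errno, EBADF, EINVAL, EIO
#include <cstdint>  // uint64_t
#include <cstring>  // std::memset

#if defined(_MSC_VER)
typedef SSIZE_T ssize_t;  // assumption: MSVC provides no ssize_t of its own
#endif

inline ssize_t pread(int fd, void* buf, size_t nbytes, size_t offset) {
  // ReadFile's byte count is a DWORD; reject requests it cannot express
  // instead of silently truncating them.
  if (static_cast<uint64_t>(nbytes) > MAXDWORD) {
    errno = EINVAL;
    return -1;
  }
  HANDLE file = reinterpret_cast<HANDLE>(_get_osfhandle(fd));
  if (file == INVALID_HANDLE_VALUE) {
    errno = EBADF;
    return -1;
  }

  OVERLAPPED overlapped;
  std::memset(&overlapped, 0, sizeof(overlapped));
  // Widen before shifting: on 32-bit builds size_t is only 32 bits wide.
  const uint64_t offset64 = static_cast<uint64_t>(offset);
  overlapped.Offset = static_cast<DWORD>(offset64);
  overlapped.OffsetHigh = static_cast<DWORD>(offset64 >> 32);

  DWORD bytes_read = 0;
  if (ReadFile(file, buf, static_cast<DWORD>(nbytes), &bytes_read,
               &overlapped)) {
    return static_cast<ssize_t>(bytes_read);
  }
  switch (GetLastError()) {
    case ERROR_HANDLE_EOF:
      return 0;  // POSIX pread reports EOF as 0 bytes read, not an error.
    case ERROR_INVALID_HANDLE:
      errno = EBADF;
      return -1;
    default:
      errno = EIO;
      return -1;
  }
}
#else
#include <unistd.h>  // POSIX pread
#endif
```

Defining NOMINMAX before <windows.h> makes a separate #undef of min and max unnecessary in this sketch. It also keeps the header self-contained: every name it uses is declared by a header it includes.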


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. stale PRs inactive for over 60 days

6 participants